library(tidyverse)
library(tigris)
library(censusapi)
library(sf)
library(mapview)
library(plotly)

options(
  tigris_class = "sf",
  tigris_use_cache = T # This stores tigris loads somewhere on your machine for much faster personal loading.
)

This script uses an output from load_safegraph.Rmd which processes down the daily social distancing dataset from Safegraph to just block groups in the Bay Area.

Loading geometries

These were the high-level lists of Bay county names, county shapes, and block group IDs from the other script, in case they become useful again.

bay_county_names <-
  c(
    "Alameda",
    "Contra Costa",
    "Marin",
    "Napa",
    "San Francisco",
    "San Mateo",
    "Santa Clara",
    "Solano",
    "Sonoma"
  )

bay_blockgroups <-
  bay_county_names %>% 
  map(function(x){
    block_groups("CA",x,progress_bar=F) %>% 
      pull(GEOID)
  }) %>% unlist()

bay_counties <-
  counties("CA", cb = F, progress_bar=F) %>% 
  filter(NAME %in% bay_county_names)

Also, as part of starting to focus on one of our partners, SJ, here’s a quick way to get all the block groups in SJ. If you want to practice this analysis with another location, or see a specific opportunity to support another partner, then this is where you’d start modifications.

scc_blockgroups <-
  block_groups("CA","Santa Clara", cb=F, progress_bar=F)

# Below uses tracts sent to us by San Jose
sj_tracts <- st_read("/users/ctenner/Pcloud Drive/SFBI/Data Library/San_Jose/CSJ_Census_Tracts/CSJ_Census_Tracts.shp") %>% 
  st_as_sf() %>% 
  st_transform(4269)
## Reading layer `CSJ_Census_Tracts' from data source `/Users/ctenner/pCloud Drive/SFBI/Data Library/San_Jose/CSJ_Census_Tracts/CSJ_Census_Tracts.shp' using driver `ESRI Shapefile'
## Simple feature collection with 219 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 6112856 ymin: 1869687 xmax: 6255982 ymax: 1996555
## epsg (SRID):    2227
## proj4string:    +proj=lcc +lat_1=38.43333333333333 +lat_2=37.06666666666667 +lat_0=36.5 +lon_0=-120.5 +x_0=2000000.0001016 +y_0=500000.0001016001 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
sj_citycouncil_disticts <- st_read("/Users/ctenner/Desktop/Classes/118/City Council Districts/CITY_COUNCIL_DISTRICTS.shp") %>% 
  st_as_sf() %>% 
  st_transform(4269)
## Reading layer `CITY_COUNCIL_DISTRICTS' from data source `/Users/ctenner/Desktop/Classes/118/City Council Districts/CITY_COUNCIL_DISTRICTS.shp' using driver `ESRI Shapefile'
## Simple feature collection with 10 features and 7 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 6112856 ymin: 1869687 xmax: 6255982 ymax: 1996555
## epsg (SRID):    2227
## proj4string:    +proj=lcc +lat_1=38.43333333333333 +lat_2=37.06666666666667 +lat_0=36.5 +lon_0=-120.5 +x_0=2000000.0001016 +y_0=500000.0001016001 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
sj_blockgroups <- 
  scc_blockgroups %>% 
  st_centroid() %>% 
  st_join(sj_tracts, left = F) %>% 
  st_join(sj_citycouncil_disticts%>% dplyr::select(DISTRICTS)) %>% 
  st_set_geometry(NULL) %>% 
  left_join(scc_blockgroups%>% dplyr::select(GEOID), by = "GEOID") %>% 
  st_as_sf() %>% 
  dplyr::select(GEOID, DISTRICTS)

# the spatial join leaves off two blockgroups which are touching district 9. The following code assigns those to district 9
sj_blockgroups$DISTRICTS[is.na(sj_blockgroups$DISTRICTS)] <- 9

sj_boundary <- 
  places("CA", cb=F, progress_bar=F) %>% 
  filter(NAME == "San Jose")

#sj_blockgroups <-
#  scc_blockgroups %>% 
#  dplyr::select(GEOID) %>%
#  st_join(sj_boundary %>% dplyr::select(geometry), left = F) #%>%
  # st_set_geometry(NULL) %>%
  # left_join(scc_blockgroups %>% dplyr::select(GEOID)) %>% 
  # st_as_sf() # last lines here not necessary because we didn't convert to centroid. thanks for the catch, cameron!

mapview(sj_blockgroups, zcol = "DISTRICTS")+mapview(sj_boundary,alpha.region= 0, color = "red", lwd = 4)

Note that in this case, unlike the R Basics example, I don’t convert to centroid before the st_join(), because there are random holes within SJ official geopolitical boundaries (whole different story) that certainly shouldn’t be removed from our analysis. So the st_join() holds onto anything that even touches the SJ boundary. But note there’s a huge rural block group to the east of SJ that comes along for the ride that’s practically the size of SJ, which you may decide to manually remove for the sake of better visualiation for now.

sj_blockgroups <-
  sj_blockgroups %>% 
  filter(!GEOID %in% c("060855135001"))

mapview(sj_blockgroups)

Loading social distancing data

Loading Bay social distancing data will still take some time, but much faster than loading the whole US. We’ll quickly create a SJ version too. When you open this file, you’ll probably see that the first few steps have been commented out to reduce knitting time, but you should practice uncommenting and running all to understand how it works.

Now’s a good time to also remind you that per your data sharing agreement, raw Safegraph data like this should definitely stay within the F Drive Restricted Data Library. You’ll see at the end the level of aggregation that is then OK to save outside of the F Drive. The gray area in-between will come down to teaching team judgment.

As you start to save RDS files, be very careful about overwriting existing files. We’re going to be backing things up a lot, but just be aware and definitely notify us if you think something has gone wrong.

# bay_socialdistancing <-
#   readRDS("P:/SFBI/Restricted Data Library/Safegraph/covid19analysis/bay_socialdistancing.rds")
# 
# sj_socialdistancing <-
#   bay_socialdistancing %>%
# filter(origin_census_block_group %in% sj_blockgroups$GEOID)
# 
# saveRDS(sj_socialdistancing, file = "P:/SFBI/Restricted Data Library/Safegraph/covid19analysis/sj_socialdistancing.rds")
sj_socialdistancing <- readRDS("/users/ctenner/Pcloud Drive/SFBI/Restricted Data Library/Safegraph/covid19analysis/sj_socialdistancing.rds")

We will eventually join the Safegraph data to our geographies, but right now since the data includes many rows for each block group (individual days), this is not a good time to do it.

Analyzing social distance data

We invite you to explore this dataset in detail, referring to the online explainer to understand the fields. The following is a simple exercise to get the number of devices staying “completely home” each day, divide it by the total devices, aggregate that information up to the SJ level, and then plot as a time series. We’ll also plot the average for the last three days as a map.

# helps to quickly check if fields have missing data before you do math.
# sum(is.na(sj_socialdistancing$completely_home_device_count))
# sum(is.na(sj_socialdistancing$device_count))

sj_percenthome_bg <-
  sj_socialdistancing %>%
  mutate(
    `% Completely at Home` = completely_home_device_count/device_count,
    date = date_range_start %>%  substr(1,10) %>% as.Date()
  ) %>% 
  dplyr::select(origin_census_block_group,date,completely_home_device_count,device_count,`% Completely at Home`)

sj_percenthome_time <-
  sj_percenthome_bg %>% 
  group_by(date) %>% 
  summarize(
    completely_home_device_count = sum(completely_home_device_count),
    device_count = sum(device_count)
  ) %>% 
  mutate(`% Completely at Home` = (completely_home_device_count/device_count*100) %>% round(1))

chart <-
  sj_percenthome_time %>% 
  ggplot(
    aes(
      x = date,
      y = `% Completely at Home`
    )
  ) + 
  geom_line() +
  labs(
    x = "Date",
    y = "Percent staying completely at home",
    title = "San Jose, CA"
  )

chart %>% ggplotly()

We can add the non-pharmaceutical interventions (NPIs) in the graph. Here’s info about SCC’s shelter in place order, 3/16/20.

chart <-
  chart +
  geom_vline(
    aes(
      xintercept = "2020-03-16" %>% as.Date() %>% as.numeric() # ggplotly needs vlines to be numeric
    ), 
    linetype="dotted"
  ) +
  annotate("text",label= "Santa Clara County\nShelter-in-Place Order", color = "black", x=(("2020-03-16" %>% as.Date())-8), y=55, size=2)

chart %>% ggplotly()
sj_percenthome_map <-
  sj_percenthome_bg %>% 
  filter(date > max(date)-3) %>% 
  group_by(origin_census_block_group) %>%
  summarize(`% Completely at Home` = round(mean(`% Completely at Home`)*100,1)) %>% 
  left_join(sj_blockgroups, by = c("origin_census_block_group" = "GEOID")) %>% 
  st_as_sf()

mapview(sj_percenthome_map, zcol = "% Completely at Home", layer.name = paste0("% Completely<br>at Home,<br>",max(sj_percenthome_bg$date)-3," to<br>",max(sj_percenthome_bg$date)))

What other kinds of actionable insights, based on your observations of needs on the ground in SJ and ultimately engagement of our stakeholders, can you create out of this dataset?

NPIs

Susan Athey from the GSB has been working on a database of non-pharmaceutical interventions, and it looks robust for the Bay Area. See this main page and the Github repo. The following code loads the Santa Clara County data (you’ll need to clone their repo).

opts_knit$set(root.dir = "/users/ctenner/Desktop/Classes/118/GitHub/covid19-intervention-data") 
npis <- read_csv("npis_raw_03-24-2020.csv")

npis_scc <- 
  npis %>% 
  filter(locality == "santa_clara_county") %>% 
  mutate(
    date = data_start %>% as.Date("%m/%d/%Y")
  ) %>% 
  dplyr::select(date,type_of_intervention,further_notes_comments) %>% 
  arrange(date)

kable(
  npis_scc,
  caption = 'Santa Clara County Non-Pharmaceutical Interventions'
)
Santa Clara County Non-Pharmaceutical Interventions
date type_of_intervention further_notes_comments
2020-01-26 HE Virus prevention education published in response to early cases in CA
2020-01-28 CI Adult male who had traveled to Wuhan, China and had been self-isolating at home
2020-02-03 TR_China Nation-wide restriction of foreigners who have traveled to China in the past 14 days
2020-03-01 HE County issues health recommendations for prevention
2020-03-02 SOE CA Governor requests money from the Disaster Response Emergency Operations Account to fight COVID-19
2020-03-04 SOE CA Governor declares state of emergency
2020-03-04 PL Paid leave policies put in place
2020-03-05 SDO Recommendation that persons at higher risk of severe illness should stay home and away from crowded social gatherings
2020-03-09 SDO Social distancing for vulnerable/at-risk populations
2020-03-11 SOE NA
2020-03-11 GS_1000 NA
2020-03-13 GS_250 Applies to gambling venues, theme parks, theaters
2020-03-13 PC Guidelines for physical closures of schools
2020-03-14 GS_100 NA
2020-03-16 PC NA
2020-03-16 CPV Restrict dine-in options for restaurants/bars
2020-03-17 NESC NA
2020-03-17 SD NA
2020-03-19 SD State-wide issue of “stay at home” order
2020-03-19 NESC Non-essential services closed
2020-03-20 CPV All city playgrounds closed

Some type_of_intervention codes are defined on the Github, but some aren’t (I’ve reached out to them to ask).

The NPIs defined are:

You can see the 3/17 shelter in place order coded as SD. I’m sharing this as resource for you to consider how to use. In theory each of these can be added to the social distancing time series as a vertical line. Comparisons could be made between the social distancing metrics of different counties based on the different times they implemented various NPIs, and could be useful to communicate to SJ as well as other partners.